Method for Establishing High Resilient Active Recovery for BGP Route Reflectors

ABSTRACT

A recovery route reflector with a monitoring module and a BGP state establishment module is peered with a plurality of primary route reflectors. Each of the plurality of primary route reflectors is peered with a set of provider edge devices. The BGP state between the recovery route reflector and each of the primary route reflectors is periodically monitored. When a primary route reflector fails, the BGP state between the recovery route reflector and the failed primary route reflector becomes idle, and the recovery route reflector establishes a peer session with the provider edge devices that had been peered with the failed route reflector.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of, and claims priority to, U.S. patent application Ser. No. 16/984,530, filed Aug. 4, 2020, which is a continuation of, and claims priority to, U.S. patent application Ser. No. 16/418,353, filed May 21, 2019, now U.S. Pat. No. 10,764,120, issued Sep. 1, 2020, the entire contents of all of which are hereby incorporated herein by reference.

TECHNICAL FIELD

The present disclosure relates to communicating network routing information. More particularly, the disclosure relates to a method, system, and computer program for establishing highly resilient active recovery for BGP route reflectors.

BACKGROUND

Border Gateway Protocol (BGP) is the routing protocol of the Internet. It is used to exchange routing information between Autonomous Systems (AS) and to route traffic across the Internet. Forwarding BGP updates within an AS introduces two challenges. First, BGP requires a BGP router to add its own AS number (ASN) to the AS_PATH attribute when forwarding BGP route updates to another AS. The AS_PATH attribute identifies the ASes through which an UPDATE message has passed and lists, in reverse order, the ASes traversed by a prefix, with the most recently traversed AS placed at the beginning of the list. The primary purpose of AS_PATH is to provide loop prevention during inter-AS routing. Second, to avoid routing loops, a BGP router drops a route if it sees its own ASN in the AS_PATH list. Thus, if each BGP router within an AS added its own ASN to the AS_PATH list when forwarding a route advertisement, the next-hop BGP router, which is in the same AS, would see its own ASN in the AS_PATH list, assume that a loop had occurred, and drop the route. Although this can be avoided by redistributing all BGP routes into an interior gateway protocol (IGP) instead of using BGP internally, the large number of routes carried by BGP can overwhelm the IGP and cause it to fail. To avoid this, internal BGP (iBGP) is used to forward route advertisements received from an external BGP router through the internal network. With iBGP, a router does not add its ASN when advertising routes to another router in the same AS; the ASN is added only when routes are advertised to a BGP router in another autonomous system, i.e., to an eBGP peer. However, because routes learned from one iBGP peer are not re-advertised to other iBGP peers (again to prevent loops), route reachability must be achieved by using a full-mesh topology between all the iBGP peers. This means that every device within an AS is logically connected to every other device through a peering relationship.
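The loop-prevention rule described above can be illustrated with a short Python sketch. It is a simplified model, not a BGP implementation: the `Route` class and the `prepend_asn` and `accept_route` helpers are hypothetical names, and the ASNs are illustrative private-use values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Route:
    """Hypothetical, simplified route: a prefix and its AS_PATH."""
    prefix: str
    as_path: List[int] = field(default_factory=list)


def prepend_asn(route: Route, local_asn: int) -> Route:
    """Return a copy of the route with the local ASN placed at the front of AS_PATH."""
    return Route(route.prefix, [local_asn] + route.as_path)


def accept_route(route: Route, local_asn: int) -> Optional[Route]:
    """Drop the route (return None) if the local ASN already appears in AS_PATH."""
    if local_asn in route.as_path:
        return None
    return route


# A route originated in AS 65001 and forwarded through AS 65002.
r = prepend_asn(prepend_asn(Route("203.0.113.0/24"), 65001), 65002)
print(r.as_path)               # [65002, 65001]
print(accept_route(r, 65003))  # accepted by a router in a third AS
print(accept_route(r, 65001))  # None: loop detected in the originating AS
# If a router in AS 65002 prepended 65002 and sent the route to a peer in the
# same AS, that peer would also see its own ASN and drop the route.
print(accept_route(r, 65002))  # None
```

The last call shows why ASN prepending cannot be used between routers of the same AS.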

Deploying an iBGP full-mesh topology can cause scalability issues in large networks. To exchange routing updates with all the other BGP routers in the full mesh, each router must maintain a peering session with every other router, consuming router and network resources. Additionally, to add a new iBGP router, network engineers must establish a connection to every other BGP router within the AS. This requires configuration changes on backbone routers, which can result in network downtime. These problems may be avoided through the use of a Route Reflector (RR).
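The scaling problem follows from simple arithmetic: a full mesh of n iBGP routers needs n(n-1)/2 sessions, and adding one router requires n new sessions. The sketch below compares that with a design in which every client peers only with the RRs and the RRs are fully meshed among themselves; the function names and router counts are illustrative assumptions.

```python
def full_mesh_sessions(n: int) -> int:
    """iBGP sessions needed when each of n routers peers with every other router."""
    return n * (n - 1) // 2


def rr_sessions(clients: int, reflectors: int) -> int:
    """Sessions when each client peers with every RR and the RRs form a full mesh."""
    return clients * reflectors + full_mesh_sessions(reflectors)


print(full_mesh_sessions(100))                            # 4950
print(full_mesh_sessions(101) - full_mesh_sessions(100))  # 100 new sessions for one router
print(rr_sessions(clients=98, reflectors=2))              # 197
```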

An RR is an iBGP feature that eliminates the need for a BGP full-mesh topology and allows iBGP to scale in large networks. The RR mechanism allows an iBGP router to act as an RR that advertises (reflects) the routes it learns from one iBGP peer to other iBGP peers within the AS.

The internal peers that connect to an RR are classified as RR client peers. An RR along with its client peers forms a cluster. Each cluster can have multiple RRs, which helps avoid a single point of failure and achieve redundancy. It is also possible to have multiple RRs within an AS where each RR is a non-client peer to another RR.

RRs are critical components for supporting high scale in large networks. Large, complex topologies require a large number of RRs to run the network and provide some protection from outage events. Traditionally, RRs were deployed in pairs to support failover; however, a dual failure would result in a service outage. Resiliency can be improved by deploying more RRs, but this comes with a higher cost and functional challenges, and it drives up scale throughout the topology. Additionally, in virtual environments, servers are frequently taken out of service for planned maintenance, resulting in ongoing states where only one RR is functioning. This exposes the network to more frequent service interruptions.

SUMMARY

One general aspect includes a method including establishing a first peer session between a recovery RR and a first RR that has established a first provider edge peer session with a first set of provider edge devices. The method also includes establishing a second peer session between the recovery RR and a second RR that has established a second provider edge peer session with a second set of provider edge devices. The method further includes monitoring a first BGP state between the first RR and the recovery RR and a second BGP state between the second RR and the recovery RR, and establishing a peer session between the recovery RR and the first set of provider edge devices when the first RR fails and the first BGP state is idle. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.

Implementations may include establishing a BGP SET in the recovery RR that manages a first provider edge BGP state between the recovery RR and the first set of provider edge devices. The BGP SET may be a container that includes all neighbor configuration policies and parameters.

One general aspect includes a system including a first RR and a second RR, where the first RR has a set of established first provider edge peer sessions with a first set of provider edge devices, and the second RR has a set of established second provider edge peer sessions with a second set of provider edge devices. The system includes a recovery RR with an established first recovery RR peer session with the first RR and an established second recovery RR peer session with the second RR. The recovery RR includes a monitoring module that monitors a first BGP state between the first RR and the recovery RR and a second BGP state between the second RR and the recovery RR. The recovery RR also includes a BGP state establishment module that establishes a peer session between the recovery RR and the first set of provider edge devices when the first BGP state is idle.

The recovery RR may include a BGP SET that manages a first provider edge BGP state between the recovery RR and the first set of provider edge devices. The BGP SET may be a container that includes all neighbor configuration policies and parameters.

One general aspect includes a non-transitory computer readable storage medium storing an information processing program to cause a computer to execute a process including establishing a first peer session between a recovery RR and a first RR, where the first RR has established a first provider edge peer session with a first set of provider edge devices. The process executed by the computer further includes establishing a second peer session between the recovery RR and a second RR, where the second RR has established a second provider edge peer session with a second set of provider edge devices. The process executed by the computer also includes monitoring a first BGP state between the first RR and the recovery RR and a second BGP state between the second RR and the recovery RR, and, when the first BGP state is idle, establishing a peer session between the recovery RR and the first set of provider edge devices.

In one embodiment, the process executed by the computer further includes establishing a BGP SET that manages a first provider edge BGP state between the recovery route reflector and the first set of provider edge devices. In an embodiment, the BGP SET is a container that includes all neighbor configuration policies and parameters.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating the network environment of a system 100 for providing active recovery for an RR.

FIG. 2 is a block diagram illustrating the network environment of the system 100 for providing active recovery for an RR when a primary RR fails.

FIG. 3 is a flowchart of a method for providing active recovery for an RR.

FIG. 4 is a block diagram illustrating the configuration of a recovery RR.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

Glossary

Artifact. An artifact is one of many kinds of tangible by-products produced during the development of software. Some artifacts (e.g., use cases, class diagrams, and other Unified Modeling Language (UML) models, requirements and design documents) help describe the function, architecture, and design of software.

Autonomous system (AS). On the Internet, an autonomous system (AS) is the unit of router policy, either a single network or a group of networks that is controlled by a common network administrator (or group of administrators) on behalf of a single administrative entity (such as a university, a business enterprise, or a business division). An autonomous system is also sometimes referred to as a routing domain. An autonomous system is assigned a globally unique number, sometimes called an Autonomous System Number (ASN). Networks within an autonomous system communicate routing information to each other using an Interior Gateway Protocol (IGP). An autonomous system shares routing information with other autonomous systems using the Border Gateway Protocol (BGP).

BGP Neighbor State. BGP forms a TCP session with neighbor routers called peers. BGP uses a Finite State Machine (FSM) to maintain a table of all BGP peers and their operational status. A BGP session may be in one of the following states: Idle, Connect, Active, OpenSent, OpenConfirm, and Established. Idle is the first state of the BGP FSM. When BGP detects a start event, it tries to initiate a TCP connection to the BGP peer and also listens for a new connection from the peer router. In the Active state, BGP attempts a new three-way TCP handshake. If a connection is established, an OPEN message is sent, the hold timer is set to four minutes, and the state moves to OpenSent. When a valid OPEN message is received from the peer, a KEEPALIVE message is sent and the state moves to OpenConfirm; receipt of a KEEPALIVE message from the peer then moves the session to Established. In the Established state, the BGP session is up and BGP neighbors exchange routes via UPDATE messages. Each time an UPDATE or KEEPALIVE message is received, the hold timer is reset. If the hold timer expires, an error is detected and BGP moves the neighbor back to the Idle state.
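The state progression described above can be sketched as a small transition table. This is a deliberately simplified model, not the full RFC 4271 FSM: the event names (`start`, `tcp_established`, and so on) are hypothetical, any unexpected event (including hold timer expiry) is treated as an error that returns the neighbor to Idle, and the 90-second negotiated hold time is only an illustrative value.

```python
from enum import Enum, auto


class BgpState(Enum):
    IDLE = auto()
    CONNECT = auto()
    ACTIVE = auto()
    OPEN_SENT = auto()
    OPEN_CONFIRM = auto()
    ESTABLISHED = auto()


INITIAL_HOLD_TIME = 240  # four minutes, set when the OPEN message is sent

# Simplified transition table: (current state, hypothetical event) -> next state.
TRANSITIONS = {
    (BgpState.IDLE, "start"): BgpState.CONNECT,
    (BgpState.CONNECT, "tcp_established"): BgpState.OPEN_SENT,
    (BgpState.CONNECT, "tcp_failed"): BgpState.ACTIVE,
    (BgpState.ACTIVE, "tcp_established"): BgpState.OPEN_SENT,
    (BgpState.OPEN_SENT, "open_received"): BgpState.OPEN_CONFIRM,
    (BgpState.OPEN_CONFIRM, "keepalive_received"): BgpState.ESTABLISHED,
    (BgpState.ESTABLISHED, "keepalive_received"): BgpState.ESTABLISHED,
    (BgpState.ESTABLISHED, "update_received"): BgpState.ESTABLISHED,
}


class BgpNeighbor:
    def __init__(self, negotiated_hold_time: int = 90) -> None:
        self.state = BgpState.IDLE
        self.hold_timer = 0
        self.negotiated_hold_time = negotiated_hold_time  # illustrative value

    def handle(self, event: str) -> BgpState:
        """Apply one event and return the resulting state."""
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            # Hold timer expiry or any unexpected event: back to Idle.
            self.state, self.hold_timer = BgpState.IDLE, 0
            return self.state
        if next_state is BgpState.OPEN_SENT:
            self.hold_timer = INITIAL_HOLD_TIME
        elif event in ("keepalive_received", "update_received"):
            self.hold_timer = self.negotiated_hold_time  # reset on each message
        self.state = next_state
        return self.state


n = BgpNeighbor()
for ev in ("start", "tcp_established", "open_received", "keepalive_received"):
    print(ev, "->", n.handle(ev).name)
print(n.handle("hold_timer_expired").name)  # IDLE
```

The transition from Established back to Idle is the event that the recovery RR described below treats as a primary RR failure.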

BGP SET. BGP SET is a new artifact that groups BGP neighbors into a single logical container.

Border Gateway Protocol (BGP). BGP is the protocol that manages how packets are routed across the Internet through the exchange of routing and reachability information between edge routers. BGP directs packets between autonomous systems (ASes), which are networks managed by a single enterprise or service provider. BGP used between routers within a single AS is referred to as internal BGP, or iBGP. More often, BGP is used to connect one AS to other autonomous systems, and it is then referred to as external BGP, or eBGP.

Container. A container is an isolated execution environment on a Linux host that behaves much like a full-featured Linux installation with its own users, file system, processes and network stack. Running an application inside of a container isolates it from the host and other containers, meaning that even when the applications inside of them are running as root, they cannot access or modify the files, processes, users, or other resources of the host or other containers. Containers have become popular due to the way they simplify the process of installing and running an application on a Linux server. Applications can have a complicated web of dependencies. The newest version of an application may require a newer version of a dependency than is available for the Linux distribution, and upgrading the dependency may break another application running on the server. However, since a container simulates a Linux environment, it becomes possible to install the dependencies in the container without causing any conflicts with the host. In fact, it's possible to run multiple containers at the same time, all with different versions of applications and libraries. Finally, containers are portable and can be shared across platforms. Docker, a popular container engine, has a specific format for containers to be stored in. This allows a developer to package a container with all of its dependencies, post it online and allow users to download and run the container right away.

Customer Edge Router. A CE router (customer edge router) is a router located on the customer premises that provides an Ethernet interface between the customer's LAN and the provider's core network. CE routers, P (provider) routers and PE (provider edge) routers are components in an MPLS (multiprotocol label switching) architecture. Provider routers are located in the core of the provider or carrier's network. Provider edge routers sit at the edge of the network. CE routers connect to PE routers and PE routers connect to other PE routers over P routers.

Loopback address. A loopback address is a type of IP address that is used to test the communication or transportation medium on a local network card and/or for testing network applications. Data packets sent on a loopback address are re-routed back to the originating node without any alteration or modification.

Provider Edge Router (PE Router). A Provider Edge router (PE router) is a router between one network service provider's area and areas administered by other network providers.

Route Reflector. A route reflector (RR) is a network routing component for BGP. It offers an alternative to the logical full-mesh requirement of internal border gateway protocol (iBGP). An RR acts as a focal point for iBGP sessions. The purpose of the RR is concentration. Multiple BGP routers can peer with a central point, the RR, acting as an RR server, rather than peer with every other router in a full mesh. All the other iBGP routers become RR clients.

Service-Aware Border Router (SABR). The SABR protocol is an extension of Contact Graph Routing that seeks to provide a routing solution for a wide range of scenarios that include both scheduled and discovered connectivity. For the scheduled connectivity regime, SABR uses a ‘contact plan’ provided by network management describing the current connectivity and future connectivity schedule. SABR then makes forwarding decisions based on an earliest-arrival-time metric where bundles are routed over the time-varying connectivity graph. SABR uses historical contact information and neighbor discovery to address routing over non-scheduled links.

FIG. 1 is a block diagram illustrating the network environment of a system 100 for providing active recovery for RRs. A plurality of primary RRs (RR1 101, RR2 103, and RR3 105) are peered with a plurality of provider edge devices (PE1A 107, PE1B 109, PE2A 111, PE2B 113, PE3A 115, and PE3B 117). The primary RRs may be virtual RRs. In the example illustrated in FIG. 1, only three primary RRs are shown; however, it is contemplated that a larger number of primary RRs (e.g., 5-50) may be used. Similarly, two provider edge devices are illustrated as being peered with each primary RR, but it is contemplated that any number of provider edge devices may be peered with each primary RR.

The primary RRs (RR1 101, RR2 103, and RR3 105) are also peered with a recovery RR 119, which may be a virtual RR. The recovery RR 119 includes a recovery subsystem 120 including a monitoring module 121 and an activation module 122. The monitoring module 121 periodically (e.g., every 5 seconds) checks the BGP session between the recovery RR 119 and each primary RR to determine whether the BGP state is established/active or idle/inactive. When the state goes from established/active to idle/inactive, the activation module 122 takes further action: it establishes a peer relationship between the recovery RR 119 and the PEs supported by the inactive primary RR.
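One way to picture the monitoring module 121 is as a polling loop that compares each primary RR's BGP state with the state seen on the previous poll. The sketch below assumes a caller-supplied `get_state` function (hypothetical; on a real router this information would come from the routing software's own interfaces), assumes every session starts in the established state, and uses the 5-second interval given as an example above.

```python
import time
from typing import Callable, Dict, List, Tuple

ESTABLISHED, IDLE = "established", "idle"


class MonitoringModule:
    """Hypothetical sketch of monitoring module 121 on the recovery RR."""

    def __init__(self, primary_rrs: List[str], get_state: Callable[[str], str]) -> None:
        self.get_state = get_state
        # Assumption: every primary RR session starts out established.
        self.last_seen: Dict[str, str] = {rr: ESTABLISHED for rr in primary_rrs}

    def poll_once(self) -> List[Tuple[str, str]]:
        """Return (primary RR, new state) for each session whose state changed."""
        changes = []
        for rr, previous in self.last_seen.items():
            current = self.get_state(rr)
            if current != previous:
                changes.append((rr, current))
                self.last_seen[rr] = current
        return changes

    def run(self, on_change: Callable[[str, str], None], interval: float = 5.0) -> None:
        """Poll forever, handing each state change to the activation logic."""
        while True:
            for rr, state in self.poll_once():
                on_change(rr, state)
            time.sleep(interval)


# Simulated BGP states standing in for the real sessions.
states = {"RR1": ESTABLISHED, "RR2": ESTABLISHED, "RR3": ESTABLISHED}
monitor = MonitoringModule(list(states), states.__getitem__)
print(monitor.poll_once())  # []
states["RR1"] = IDLE        # RR1 fails
print(monitor.poll_once())  # [('RR1', 'idle')]
```

Reporting only transitions, rather than the current state, means the activation logic is triggered once per failure or restoration instead of on every poll.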

The recovery RR 119 must maintain a copy of the configuration differences of all primary RRs that it supports. Those differences are captured in containers labeled BGP SETs (e.g., BGP SET-1 123, BGP SET-2 124, and BGP SET-3 125). A BGP SET is a new data artifact that resides locally in the recovery RR 119. For each protected primary RR, the unique configuration of that primary RR, including all clients (PEs) that it supports and the IP addresses of those clients, is placed in a BGP SET. The BGP SET does not modify BGP adjacency (the establishment of a session between two BGP neighbors) or the attributes distributed over the session, where a neighbor is a client of the route reflector, i.e., a PE. Instead, the SET is designed to create a session or adjacency with the neighbors defined in the SET. Each BGP SET is a container and a group-level session management function. The container includes all neighbor configuration, policies, and parameters, such as hold/keepalive settings. Each BGP SET must include a router ID/loopback address, and clients defined within the BGP SET must establish BGP peering with the BGP SET itself. Although only three BGP SETs are illustrated in FIG. 1, the recovery RR 119 may have any number of BGP SETs (typically between 5 and 50), with each SET representing a different primary RR configuration.
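The BGP SET can be modeled as a simple record. In this sketch the class and field names, the 90/30-second hold/keepalive defaults, and the documentation-range addresses are illustrative assumptions; what it reflects from the text is that each SET belongs to one protected primary RR, lists that RR's client PE addresses, must carry a router ID/loopback address, and has a single group-level state.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BgpSet:
    """Hypothetical model of one BGP SET held on the recovery RR."""
    name: str
    protected_rr: str
    router_id: str  # router ID / loopback address (mandatory)
    neighbors: List[str] = field(default_factory=list)  # client PE addresses
    hold_time: int = 90  # illustrative hold/keepalive settings
    keepalive: int = 30
    policies: List[str] = field(default_factory=list)
    up: bool = False  # one state for the whole group: up or shutdown

    def __post_init__(self) -> None:
        if not self.router_id:
            raise ValueError(f"{self.name}: a router ID/loopback address is required")

    def session_states(self) -> Dict[str, str]:
        """Desired state of every neighbor session, derived from the group state."""
        state = "up" if self.up else "shutdown"
        return {pe: state for pe in self.neighbors}


set1 = BgpSet("BGP SET-1", protected_rr="RR1", router_id="192.0.2.1",
              neighbors=["198.51.100.11", "198.51.100.12"])
print(set1.session_states())  # both neighbors 'shutdown' while RR1 is healthy
```

Because the state lives on the SET rather than on each neighbor, flipping `up` changes the desired state of every client session at once, which is the group-level management the text describes.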

The recovery RR 119 may also access global configurations 126, which are the configurations that are common to the primary RRs (RR1 101, RR2 103, and RR3 105). Each BGP SET inherits all global configurations (e.g., subsequent address family identifiers (SAFIs) such as L3VPN and L2VPN, the autonomous system number (ASN), the interior gateway protocol (IGP), and policies).
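Inheritance from the global configurations 126 can be sketched as a dictionary merge in which the BGP SET supplies only its differences. The keys, values, and the rule that a SET-level value would take precedence over a global one are assumptions made for illustration; the text only states that each SET inherits all global configurations.

```python
from typing import Any, Dict

# Illustrative global configuration shared by all protected primary RRs.
GLOBAL_CONFIG: Dict[str, Any] = {
    "asn": 64500,
    "safis": ["l3vpn", "l2vpn"],
    "igp": "isis",
    "policies": ["rr-client-import", "rr-client-export"],
}


def effective_config(global_cfg: Dict[str, Any], set_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Start from the global configuration, then apply the SET-specific differences."""
    merged = dict(global_cfg)  # copy so the shared global configuration is untouched
    merged.update(set_cfg)
    return merged


set1_diff = {"router_id": "192.0.2.1", "neighbors": ["198.51.100.11", "198.51.100.12"]}
cfg = effective_config(GLOBAL_CONFIG, set1_diff)
print(cfg["asn"], cfg["router_id"])  # 64500 192.0.2.1
```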

There are different categories of RRs: intra, inter, and SABR. The recovery RR 119 must be classified in one of these categories, and it must be of the same category as the primary RRs that it services.

The operation of the system is illustrated in FIGS. 1 and 2. As shown in FIG. 1, RR1 101 supports PE1A 107 and PE1B 109, RR2 103 supports PE2A 111 and PE2B 113, and RR3 105 supports PE3A 115 and PE3B 117. When the primary RRs are operational, the BGP sessions between the recovery RR 119 and the primary RRs (RR1 101, RR2 103, and RR3 105) are all established, and BGP SET-1 123, BGP SET-2 124, and BGP SET-3 125 are all inactive. In that case, no connections exist between the PEs and the recovery RR 119.

FIG. 2 illustrates what happens when a primary RR (e.g., RR1 101) fails. In that case, the BGP state between the recovery RR 119 and RR1 101 becomes idle. This change of BGP state is the trigger used by the activation module 122 on the recovery RR 119 to activate a peer relationship with the set of PEs (e.g., PE1A 107 and PE1B 109) associated with the primary RR (RR1 101) that failed. In the event of an outage of a protected RR, the BGP SET of the affected primary RR becomes active, and the loopback address of the affected peer is used to build BGP sessions to each client (PE) defined in the BGP SET. In this example, the activation module 122 accesses BGP SET-1 123, which becomes active. The BGP SET manages the BGP state for all neighbors defined within the SET, with the BGP state being either up or shutdown. The BGP SET container holds all the neighbor (PE) configurations and can define all sessions as either up (the sessions are established) or shutdown (none of the sessions are established). Unlike existing BGP sessions, where the state is managed on a per-session basis, the BGP SET is managed collectively for all BGP neighbors within the BGP SET.

The recovery RR 119 maintains the BGP global configuration 126, which holds the common configurations of the primary RRs. The recovery RR 119 can manage multiple BGP SETs, where each SET represents a client (PE) group on a protected primary RR. A BGP SET defines the configuration differences, which are limited to the neighbor IP addresses used for BGP session establishment. A given router may hold multiple BGP SETs; for example, a router may have three BGP SETs. The difference between those BGP SETs is essentially the neighbor addresses of the clients that each BGP SET serves. For example, BGP SET 1 may support routers 1-10, BGP SET 2 may support routers 11-20 with the PE addresses of routers 11-20, and BGP SET 3 may support routers 21-30 with the PE addresses of routers 21-30.
The BGP SET also maintains the router-ID/loopback address for each protected PE. When the failed primary RR returns to an established state, the recovery RR disables its BGP SET and returns to active monitoring mode.
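The activation and restoration behavior described above can be sketched as a handler that the monitoring logic calls on every state change. The `ActivationModule` class, its method name, and the string state values are hypothetical; setting the `up` flag stands in for building (or tearing down) the BGP sessions from the SET's loopback address to every PE in the SET at once.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BgpSet:
    """Minimal stand-in for a BGP SET: loopback, client PEs, group state."""
    router_id: str
    neighbors: List[str]
    up: bool = False


class ActivationModule:
    """Hypothetical sketch of activation module 122."""

    def __init__(self, sets_by_rr: Dict[str, BgpSet]) -> None:
        self.sets_by_rr = sets_by_rr  # protected primary RR -> its BGP SET

    def on_state_change(self, primary_rr: str, state: str) -> List[str]:
        """Bring the whole SET up or down; return the PEs whose sessions changed."""
        bgp_set = self.sets_by_rr[primary_rr]
        if state == "idle" and not bgp_set.up:
            bgp_set.up = True  # primary RR failed: peer with all of its PEs
            return list(bgp_set.neighbors)
        if state == "established" and bgp_set.up:
            bgp_set.up = False  # primary RR restored: back to monitoring mode
            return list(bgp_set.neighbors)
        return []


sets = {
    "RR1": BgpSet("192.0.2.1", ["PE1A", "PE1B"]),
    "RR2": BgpSet("192.0.2.2", ["PE2A", "PE2B"]),
}
activation = ActivationModule(sets)
print(activation.on_state_change("RR1", "idle"))         # ['PE1A', 'PE1B']
print(activation.on_state_change("RR1", "established"))  # ['PE1A', 'PE1B']
```

Only the SET of the failed primary RR changes; the SETs of healthy primary RRs stay shut down.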

Illustrated in FIG. 3 is a flowchart for a method 300 for establishing highly resilient active recovery for primary RRs.

In step 301, the method 300 establishes a peer session between a first set of PEs and a first primary RR.

In step 303, the method 300 establishes a peer session between a second set of PEs and a second primary RR.

In step 305, the method 300 establishes a peer session between the first primary RR and a recovery RR.

In step 307, the method 300 establishes a peer session between the second primary RR and the recovery RR.

In step 309, the method 300 establishes a BGP SET in the recovery RR for managing a PE BGP state between the recovery RR and the first set of PEs.

In step 311, the method 300 establishes a BGP SET in the recovery RR for managing a PE BGP state between the recovery RR and the second set of PEs.

In step 313, the method 300 establishes a global configuration in the recovery RR for managing the common configurations of the first primary RR and the second primary RR.

In step 315, the method 300 monitors a first BGP state between the first primary RR and the recovery RR.

In step 317, the method 300 monitors a second BGP state between the second primary RR and the recovery RR.

In step 319, the method 300 establishes a peer session between the recovery RR and the first set of PEs when the first BGP state is idle as a result of the failure of the first primary RR.
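Steps 301 through 319 can be condensed into one function that records which peer sessions exist after a single monitoring pass. This is a sketch under simplifying assumptions: sessions are plain (endpoint, endpoint) tuples, a failed primary RR is assumed to lose all of its sessions, the global configuration of step 313 is not modeled, and the function and parameter names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Session = Tuple[str, str]


def method_300(pe_groups: Dict[str, List[str]], recovery_rr: str,
               bgp_states: Dict[str, str]) -> Set[Session]:
    """Return the peer sessions in place after one pass of method 300."""
    sessions: Set[Session] = set()
    for rr, pes in pe_groups.items():
        sessions.update((rr, pe) for pe in pes)  # steps 301 and 303
        sessions.add((rr, recovery_rr))          # steps 305 and 307
    # Steps 309 and 311: one BGP SET (here, just the PE list) per primary RR.
    bgp_sets = {rr: list(pes) for rr, pes in pe_groups.items()}
    # Steps 315 and 317: check the BGP state toward each primary RR.
    for rr, pes in bgp_sets.items():
        if bgp_states.get(rr) == "idle":
            # Step 319: the failed RR's sessions are gone; the recovery RR
            # peers with every PE in that RR's BGP SET.
            sessions = {s for s in sessions if rr not in s}
            sessions.update((recovery_rr, pe) for pe in pes)
    return sessions


groups = {"RR1": ["PE1A", "PE1B"], "RR2": ["PE2A", "PE2B"]}
after = method_300(groups, "RR-R", {"RR1": "idle", "RR2": "established"})
print(sorted(after))
```

With RR1 idle, PE1A and PE1B end up peered with the recovery RR, while RR2 and its PEs are unaffected.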

FIG. 4 illustrates the configuration of the recovery RR 119. The recovery RR 119 includes global configurations 126, which are the configurations that are common to the primary RRs (RR1 101, RR2 103, and RR3 105). The common configurations may include Subsequent Address Family Identifiers (SAFIs) associated with a Layer 2 VPN and a Layer 3 VPN, autonomous system numbers (ASNs), the interior gateway protocol (IGP), and policies. The recovery RR 119 also includes a plurality of containers (BGP SET-1 123, BGP SET-2 124, and BGP SET-3 125).

Embodiments within the scope of the present invention may also include computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or combination thereof) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of the computer-readable media.

Computer-executable instructions include, for example, instructions and data which cause a general-purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, objects, components, and data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.

Although the above description may contain specific details, they should not be construed as limiting the claims in any way. Other configurations of the described embodiments of the invention are part of the scope of this invention. Accordingly, the appended claims and their legal equivalents should only define the invention, rather than any specific examples given.

What is claimed:
 1. A method comprising: establishing, by one or more processors, a first peer session between a recovery route reflector and a first route reflector; monitoring, by the one or more processors, a routing protocol state between the first route reflector and the recovery route reflector; based upon a determination that the routing protocol state is not active, establishing, by the one or more processors, a second peer session between the recovery route reflector and a first set of provider edge devices associated with the first route reflector; and establishing, by the one or more processors, a data artifact relating to the recovery route reflector and the first set of provider edge devices.
 2. The method of claim 1, further comprising establishing, by the one or more processors, a third peer session between the recovery route reflector and a second route reflector, monitoring, by the one or more processors, a second routing protocol state between the second route reflector and the recovery route reflector, and based upon a determination that the second routing protocol state is not active, establishing, by the one or more processors, a fourth peer session between the recovery route reflector and a second set of provider edge devices associated with the second route reflector.
 3. The method of claim 1, wherein the data artifact comprises neighbor configuration policies and parameters.
 4. The method of claim 1, wherein the data artifact comprises a router ID.
 5. The method of claim 1, wherein the data artifact comprises a loopback address.
 6. The method of claim 1, wherein the recovery route reflector comprises a virtual route reflector.
 7. The method of claim 1, wherein the first set of provider edge devices comprises routers.
 8. A system comprising: one or more processors; and memory coupled with the one or more processors, the memory storing executable instructions that when executed by the one or more processors cause the one or more processors to effectuate operations comprising: establishing a first peer session between a recovery route reflector and a first route reflector; monitoring a routing protocol state between the first route reflector and the recovery route reflector; based upon a determination that the routing protocol state is idle, establishing a second peer session between the recovery route reflector and a first set of provider edge devices associated with the first route reflector; and establishing an artifact for the recovery route reflector and the first set of provider edge devices.
 9. The system of claim 8, further comprising establishing a third peer session between the recovery route reflector and a second route reflector, monitoring a second routing protocol state between the second route reflector and the recovery route reflector, and based upon a determination that the second routing protocol state is idle, establishing a fourth peer session between the recovery route reflector and a second set of provider edge devices associated with the second route reflector.
 10. The system of claim 8, wherein the artifact comprises neighbor configuration policies and parameters.
 11. The system of claim 8, wherein the artifact comprises a router ID.
 12. The system of claim 8, wherein the artifact comprises a loopback address.
 13. The system of claim 8, wherein the recovery route reflector comprises a virtual route reflector.
 14. The system of claim 8, wherein the first set of provider edge devices comprises routers.
 15. A non-transitory computer readable storage medium storing computer executable instructions that when executed by a computing device cause said computing device to effectuate operations comprising: establishing a first peer session between a recovery route reflector and a first route reflector; monitoring a routing protocol state between the first route reflector and the recovery route reflector; when the routing protocol state is idle, establishing a second peer session between the recovery route reflector and a first set of provider edge devices; and establishing a data artifact for the recovery route reflector and the first set of provider edge devices.
 16. The non-transitory computer readable storage medium of claim 15, further comprising establishing a third peer session between the recovery route reflector and a second route reflector, monitoring a second routing protocol state between the second route reflector and the recovery route reflector, and, when the second routing protocol state is idle, establishing a fourth peer session between the recovery route reflector and a second set of provider edge devices.
 17. The non-transitory computer readable storage medium of claim 15, wherein the data artifact comprises neighbor configuration policies and parameters.
 18. The non-transitory computer readable storage medium of claim 15, wherein the data artifact comprises a router ID.
 19. The non-transitory computer readable storage medium of claim 15, wherein the data artifact comprises a loopback address.
 20. The non-transitory computer readable storage medium of claim 15, wherein the recovery route reflector comprises a virtual route reflector.